Reuse known fileids and cache data in the scanner #10993
Found an even bigger improvement: the cache data itself is now reused, bringing the optimal case (cache is up to date) down to 3 queries per folder, from 5 per folder plus 3 per file. In my local environment, scanning the ~44k files now takes ~32k queries and 190 s, compared to the original 160k queries and 374 s.
Force-pushed from 3883d96 to 5724c12
Another round of optimization later and we're down to 1 query per folder in the best case, for a total of 14k queries in my test case vs. 160k before optimization.
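To illustrate the "1 query per folder in the best case" idea, here is a minimal sketch, not the actual scanner code. `FakeCache`, `getFolderContents`, `put`, and `scanFolder` are hypothetical names used only for this example; the point is that the cached rows for a folder are fetched once and then reused for every child instead of being looked up per file.

```php
<?php
// Hypothetical in-memory stand-in for the file cache; it only counts "queries".
class FakeCache {
    public $queries = 0;
    private $rows; // folder path => [child name => cached row]

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    // One query returns every cached child row of a folder.
    public function getFolderContents(string $folder): array {
        $this->queries++;
        return $this->rows[$folder] ?? [];
    }

    // Writing a new or changed row costs one more query.
    public function put(string $folder, string $name, array $data): void {
        $this->queries++;
        $this->rows[$folder][$name] = $data;
    }
}

// Scan one folder: fetch the cached rows once, then reuse them for each child.
// Returns the number of entries that had to be written to the cache.
function scanFolder(FakeCache $cache, string $folder, array $storageChildren): int {
    $cached = $cache->getFolderContents($folder);
    $updated = 0;
    foreach ($storageChildren as $name => $data) {
        $cacheData = $cached[$name] ?? null; // reused, no per-file lookup
        if ($cacheData === null || $cacheData['mtime'] !== $data['mtime']) {
            $cache->put($folder, $name, $data);
            $updated++;
        }
    }
    return $updated;
}

$cache = new FakeCache(['files' => [
    'a.txt' => ['mtime' => 10],
    'b.txt' => ['mtime' => 20],
]]);
// Cache is up to date, so the whole folder costs a single query.
scanFolder($cache, 'files', ['a.txt' => ['mtime' => 10], 'b.txt' => ['mtime' => 20]]);
```

In this sketch an up-to-date folder costs exactly one read, and extra queries are only spent on entries that are new or whose mtime differs.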
Force-pushed from 5724c12 to ea69cd1
It sounds like a cool improvement, but I'm worried that we will run into situations like #10954 (comment) more often if we cache too much. Alternatively, we need a way to auto-clean older cache entries while scanning is in progress.
If you have the folder structure:
As soon as scanning of |
This actually uses (slightly) less memory than before: it reuses objects that were already kept in memory instead of requesting another copy of each object from the database.
@DeepDiver1975 @PVince81 @th3fallen can this be merged?
@@ -104,9 +104,11 @@ public function getData($path) {
 *
 * @param string $file
 * @param int $reuseExisting
 * @param int $parentId
 * @param array|null $cacheData
Is this the cacheData of the parent or of the file to be scanned?
Added documentation
Mhm - I don't see this clarified yet.
Weird, added it now
Maybe after the release, when there will be more time to review OC 8 stuff...
Force-pushed from 7caf1c5 to 64e26d3
Fixes #8548
Force-pushed from 75be54a to 268c5cb
    $folderId = $this->cache->getId($path);
}
$existingChildren = $this->getExistingChildren($folderId);
$newChildren = $this->getNewChildren($path);
Can you explain this addition? Does it mean we previously rescanned all children, not only the new ones?
$newChildren here means the children as reported by the storage, as opposed to the children reported by the cache ($existingChildren).
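The distinction between the two lists can be sketched as follows. This is only an illustration: `diffChildren` is a hypothetical helper, and the assumed shapes (cache rows keyed by file name, storage listing as a plain list of names) are guesses that are not taken from the PR.

```php
<?php
// Hypothetical helper: compare the storage listing with the cache listing.
// Assumes $existingChildren is keyed by file name (cache rows) and
// $newChildren is a flat list of names reported by the storage.
function diffChildren(array $existingChildren, array $newChildren): array {
    $existingNames = array_keys($existingChildren);
    return [
        // In the cache but no longer on the storage.
        'removed' => array_values(array_diff($existingNames, $newChildren)),
        // On the storage but not yet in the cache.
        'added' => array_values(array_diff($newChildren, $existingNames)),
    ];
}

$existingChildren = ['a.txt' => ['fileid' => 1], 'old.txt' => ['fileid' => 2]];
$newChildren = ['a.txt', 'fresh.txt'];
$diff = diffChildren($existingChildren, $newChildren);
// $diff['removed'] is ['old.txt'] and $diff['added'] is ['fresh.txt'].
```

Entries present in both lists (here `a.txt`) are the ones whose cached data can be compared with the storage data without another lookup.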
The code looks good but I don't understand it 100%.
@@ -194,6 +194,9 @@ public function getFolderContentsById($fileId) {
        $file['size'] = $file['unencrypted_size'];
    }
    $file['permissions'] = (int)$file['permissions'];
    $file['mtime'] = (int)$file['mtime'];
    $file['storage_mtime'] = (int)$file['storage_mtime'];
    $file['size'] = (int)$file['size'];
Will this lead to problems on 32-bit platforms?
@icewind1991 what about this comment?
@icewind1991 Want to rebase this and make it ready for review, or is this something for after 8.0?
Force-pushed from c3d950f to 5862e4f
Please rebase + squash and fix the unit tests, if applicable 😄
Force-pushed from 3c1b6e6 to ddcd9ce
@DeepDiver1975 @PVince81 @MorrisJobke merge?
Force-pushed from b31ad2b to 9df18ff
Squashed and fixed the 32-bit int issue.
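For context on why an `(int)` cast on a file size can be a problem on 32-bit platforms: sizes above 2^31 - 1 bytes do not fit in a 32-bit PHP integer. One common way around this, shown here only as a sketch and not necessarily what this PR does, is to let PHP pick int or float from the numeric string. `toNumber` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a numeric string to int when it fits,
// and to float otherwise, instead of forcing an (int) cast.
function toNumber(string $value) {
    if (!is_numeric($value)) {
        throw new InvalidArgumentException("Not numeric: $value");
    }
    // Arithmetic on a numeric string yields an int, or a float when the
    // value is outside the platform's integer range.
    return $value + 0;
}

$size = toNumber('5000000000'); // larger than 2^31 - 1
$mtime = toNumber('1420070400'); // fits in a 32-bit int
```

On a 64-bit build `$size` is an int, while on a 32-bit build it becomes a float, so the value is preserved in both cases.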
The inspection completed: 5 new issues, 2 updated code elements
Refer to this link for build results (access rights to CI server needed):
Tested against SFTP ext storage with remote changes, scanning still works correctly 👍
Tested 👍
Reuse known fileids and cache data in the scanner
Regression found: #14169
Saves having to do multiple getId calls for every file and folder being scanned.
Saves about 90% (earlier estimates: 50%, then 80%) of the database calls made during a file scan.
cc @PVince81 @DeepDiver1975 @th3fallen